Fuzzy rough classifiers for class imbalanced multi-instance data

Authors

  • Sarah Vluymans
  • Dánel Sánchez Tarragó
  • Yvan Saeys
  • Chris Cornelis
  • Francisco Herrera
Abstract

In multi-instance learning, each learning object consists of many descriptive instances. In the corresponding classification problems, each training object is labeled, but its constituent instances are not. The classification objective is to predict the class label of unseen objects. As in traditional single-instance classification, when the class sizes of multi-instance data are imbalanced, classification is degraded. Many multi-instance classifiers have been proposed, but few take into account the possibility of class imbalance, which causes them to fail in this situation. In this paper, we propose a new type of classifier that embodies a solution to the multi-instance class imbalance problem. Our proposal relies on the use of fuzzy rough set theory. We present two families of classifiers respectively based on information extracted at bag-level and at instance-level. We experimentally show that our algorithms outperform state-of-the-art solutions to multi-instance imbalanced data classification, evaluated by the popular metrics AUC and geometric mean. © 2015 Elsevier Ltd. All rights reserved.
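
The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes binary bag-level labels (1 for the minority class, 0 for the majority class), takes the geometric mean to be the square root of the product of the true positive rate and the true negative rate, and computes AUC as the fraction of positive/negative pairs ranked correctly. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from math import sqrt


def geometric_mean(y_true, y_pred, positive=1):
    """Geometric mean of true positive rate and true negative rate."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return sqrt(tpr * tnr)


def auc(y_true, scores, positive=1):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy imbalanced problem: 2 minority bags, 8 majority bags (made-up labels).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A classifier that always predicts the majority class is 80% accurate,
# yet its geometric mean is 0 because it never finds a minority bag.
always_majority = [0] * len(y_true)
accuracy = sum(t == p for t, p in zip(y_true, always_majority)) / len(y_true)
g_majority = geometric_mean(y_true, always_majority)

# Made-up scores for a second classifier, used to illustrate AUC.
scores = [0.9, 0.4, 0.1, 0.2, 0.3, 0.5, 0.1, 0.2, 0.3, 0.8]
auc_value = auc(y_true, scores)
```

The always-majority case shows why plain accuracy is a poor yardstick under class imbalance: it rewards ignoring the minority class, whereas the geometric mean collapses to zero as soon as one class is never predicted correctly.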

Similar resources

Fuzzy-rough imbalanced learning for the diagnosis of High Voltage Circuit Breaker maintenance: The SMOTE-FRST-2T algorithm

For any electric power system, it is crucial to guarantee a reliable performance of its High Voltage Circuit Breaker (HVCB). Determining when the HVCB needs maintenance is an important and non-trivial problem, since these devices are used over extensive periods of time. In this paper, we propose the use of data mining techniques in order to predict the need of maintenance. In the corresponding ...

Combining rough sets and rule based classifiers for handling imbalanced data

The paper presents two rough sets based filtering approaches combined with rule based classifiers suited for handling imbalanced data sets, i.e., data sets where the minority class of primary importance is under-represented in comparison to the majority classes. We introduced two techniques to detect and process inconsistent majority cases in the boundary between the minority and majority class...

Preprocessing noisy imbalanced datasets using SMOTE enhanced with fuzzy rough prototype selection

The Synthetic Minority Over Sampling TEchnique (SMOTE) is a widely used technique to balance imbalanced data. In this paper we focus on improving SMOTE in the presence of class noise. Many improvements of SMOTE have been proposed, mostly cleaning or improving the data after applying SMOTE. Our approach differs from these approaches by the fact that it cleans the data before applying SMOTE, such...

Improving SMOTE with Fuzzy Rough Prototype Selection to Detect Noise in Imbalanced Classification Data

In this paper, we present a prototype selection technique for imbalanced data, Fuzzy Rough Imbalanced Prototype Selection (FRIPS), to improve the quality of the artificial instances generated by the Synthetic Minority Over-sampling TEchnique (SMOTE). Using fuzzy rough set theory, the noise level of each instance is measured, and instances for which the noise level exceeds a certain threshold le...

Evaluation of Classifiers in Software Fault-Proneness Prediction

The reliability of software depends on its fault-prone modules: the fewer fault-prone units a piece of software contains, the more we may trust it. Therefore, if we are able to predict the number of fault-prone modules of software, it will be possible to judge the software reliability. In predicting software fault-prone modules, one of the contributing features is software metric by which one ...

Journal:
  • Pattern Recognition

Volume 53

Publication date: 2016